STATS 32: Introduction to R for Undergraduates

Kenneth Tay

Sep 24, 2019

http://web.stanford.edu/~kjytay/courses/stats32-aut2019/

Agenda for today

The big data explosion



What is R?

Ross Ihaka & Rob Gentleman

Why learn R?

Reason #1: R was specifically designed for statistics and data analysis.

Example: Map of 2016 U.S. presidential elections

Example: Spotify Top 100 Songs in 2017

Why learn R?

(Source: Stack Overflow)

Why learn R?

Reason #3: It’s easy to get started with R.

Stack Overflow

Q&A site for programmers

Packages

Why learn R?

Reason #4: Analyses done in R are reproducible.

R script


# load packages and get dataset
library(ggplot2)
data(mtcars)

# plot of miles per gallon vs. horsepower, colored by no. of cylinders
ggplot(data = mtcars, aes(x = hp, y = mpg, col = factor(cyl))) +
    geom_point() +
    labs(title = "Miles per gallon vs. horsepower")
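Because a script like the one above is plain text, rerunning it starts from the same built-in dataset each time. As a companion sketch in base R (not part of the original slides; the name `mean_mpg_by_cyl` is illustrative), the same `mtcars` data can also be summarized numerically:

```r
# load the built-in dataset (ships with R's datasets package)
data(mtcars)

# average miles per gallon for each number of cylinders
mean_mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)

# print the averages rounded to one decimal place
print(round(mean_mpg_by_cyl, 1))
```

Cars with fewer cylinders should show a higher average miles per gallon, which matches the downward trend in the plot.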

R markdown: input

R markdown: output

Why learn R?

Reason #5: Community

How about you?

In the next minute, introduce yourself to someone around you!

Course objectives

By the end of this course, students will be able to:

Note: Course is for undergraduates!!

Tentative overview of the course

Class logistics


Assignments

What is a variable?

x <- 3
y <- "abc"
y <- 5
x <- y
x + y   # 5 + 5 = 10
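
The assignments above can be traced one step at a time. This is a minimal sketch in base R; the variable `z` and the final reassignment of `y` are additions for illustration and do not appear in the slides. It suggests that `x <- y` copies the value `y` holds at that moment, so changing `y` afterwards leaves `x` untouched.

```r
x <- 3        # x holds the number 3
y <- "abc"    # y holds the character string "abc"
y <- 5        # reassigning y replaces "abc" with the number 5
x <- y        # x now holds a copy of y's current value, 5
z <- x + y    # z (illustrative name) stores 5 + 5
y <- 100      # changing y afterwards...
print(x)      # ...does not change x
print(z)
```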

Variable types

Confusion: 123 vs. “123”

How can we tell numeric variables apart from character variables that consist only of digits?
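
One way to check is to ask R for the type directly. The sketch below uses only base R functions (`class()`, `is.numeric()`, `is.character()`, `as.numeric()`); the variable names `a`, `b`, and `b_num` are illustrative and not from the slides.

```r
a <- 123      # no quotes: a numeric value
b <- "123"    # quotes: a character string that happens to contain digits

# class() reports the type of each variable
print(class(a))
print(class(b))

# is.numeric() and is.character() give TRUE/FALSE answers
print(is.numeric(a))
print(is.character(b))

# arithmetic works on the numeric value
print(a + 1)

# b + 1 would stop with an error, because b is a character string;
# convert it to a number first with as.numeric()
b_num <- as.numeric(b)
print(b_num + 1)
```

In the console, a character value is also printed with quotation marks around it, while a numeric value is not.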









Optional material

List of useful packages

Other fun R stuff

R-bloggers

Blog aggregator of content contributed by bloggers who write about R

The R Journal

Biannual open-access journal featuring short to medium-length articles on topics of interest to R users and developers

R-exercises

Website with both tutorials and exercises

DataCamp

Website for learning data science, R included (some courses free, some not)